# Health Check Monitor
A health check monitor fetches a JSON endpoint that reports on multiple sub-checks within your application. This is ideal for monitoring internal services like database connections, cache availability, queue health, and disk space from a single endpoint.
## How It Works
Beepr fetches your health check URL every minute and parses the JSON response. Based on the sub-check statuses, Beepr determines the overall monitor status:
| Condition | Status | Creates Incident |
|---|---|---|
| All checks `ok` | Up (green) | No |
| Any check `warning` | Degraded (yellow) | No |
| Any check `failed` or `crashed` | Down (red) | Yes |
| `finishedAt` timestamp too old | Stale (yellow) | No |
When an incident is created, the failing sub-checks are listed in the incident timeline with their notification messages.
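The precedence in the table above can be sketched as a small function. This is a hypothetical illustration of the documented rules, not Beepr's actual implementation; staleness is handled separately via the `finishedAt` timestamp.

```javascript
// Sketch of the status-derivation rules: failed/crashed outranks warning,
// which outranks ok; skipped checks are ignored entirely.
function overallStatus(checkResults) {
  const active = checkResults.filter((c) => c.status !== 'skipped');
  if (active.some((c) => c.status === 'failed' || c.status === 'crashed')) {
    return { status: 'down', createsIncident: true };
  }
  if (active.some((c) => c.status === 'warning')) {
    return { status: 'degraded', createsIncident: false };
  }
  return { status: 'up', createsIncident: false };
}
```

For example, a response containing one `ok` check and one `warning` check yields "degraded" without creating an incident; a single `crashed` check yields "down" and opens an incident regardless of how many other checks passed.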
## Response Format
Your endpoint must return JSON with this structure:
```json
{
  "finishedAt": "1638879833",
  "checkResults": [
    {
      "name": "Database",
      "label": "Database Connection",
      "status": "ok",
      "notificationMessage": "",
      "shortSummary": "connected"
    },
    {
      "name": "Redis",
      "label": "Redis Cache",
      "status": "ok",
      "notificationMessage": "",
      "shortSummary": "available"
    },
    {
      "name": "DiskSpace",
      "label": "Disk Space",
      "status": "warning",
      "notificationMessage": "Disk usage at 85%",
      "shortSummary": "85%"
    }
  ]
}
```
### Top-Level Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `finishedAt` | string or integer | No | Unix timestamp (seconds) when the checks were last run. Used for staleness detection. |
| `checkResults` | array | Yes | Array of check result objects |
### Check Result Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Unique identifier for the check |
| `status` | string | Yes | One of: `ok`, `warning`, `failed`, `crashed`, `skipped` |
| `label` | string | No | Human-readable name displayed in the UI |
| `notificationMessage` | string | No | Detailed message shown in incidents when the check fails |
| `shortSummary` | string | No | Brief status text for the dashboard (e.g., "45%", "connected") |
| `meta` | object | No | Additional metadata (stored but not displayed) |
## Status Values

| Status | Meaning | Effect |
|---|---|---|
| `ok` | Check passed | Contributes to "up" status |
| `warning` | Check has a non-critical issue | Sets monitor to "degraded" (no incident) |
| `failed` | Check failed | Sets monitor to "down" and creates an incident |
| `crashed` | Check crashed unexpectedly | Sets monitor to "down" and creates an incident |
| `skipped` | Check was skipped | Ignored in status calculation |
## Staleness Detection
If your health check endpoint caches results or runs checks periodically (not on every request), the finishedAt timestamp helps detect stale data.
Configure the staleness threshold (1-60 minutes) when creating the monitor. If the finishedAt timestamp is older than the threshold, the monitor status changes to "stale".
This is useful for:
- Endpoints that cache health check results
- Background jobs that update health status periodically
- Detecting when your health check system itself has stopped running
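The staleness comparison itself is simple; a sketch of the rule described above (the threshold and the handling of a missing timestamp here are illustrative, not Beepr's exact behavior):

```javascript
// Compare `finishedAt` (Unix seconds, sent as a string or integer)
// against the configured threshold in minutes. Without a finishedAt
// field there is nothing to compare, so staleness cannot be detected.
function isStale(finishedAt, thresholdMinutes, nowSeconds = Math.floor(Date.now() / 1000)) {
  const finished = Number(finishedAt);
  if (!Number.isFinite(finished)) return false;
  return nowSeconds - finished > thresholdMinutes * 60;
}
```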
## Example Implementations
### Elixir/Phoenix

```elixir
defmodule MyAppWeb.HealthController do
  use MyAppWeb, :controller

  def index(conn, _params) do
    checks = [
      check_database(),
      check_redis(),
      check_disk_space()
    ]

    json(conn, %{
      finishedAt: DateTime.utc_now() |> DateTime.to_unix() |> to_string(),
      checkResults: checks
    })
  end

  defp check_database do
    case MyApp.Repo.query("SELECT 1") do
      {:ok, _} ->
        %{name: "Database", label: "Database Connection", status: "ok", shortSummary: "connected"}

      {:error, reason} ->
        %{name: "Database", label: "Database Connection", status: "failed",
          notificationMessage: "Database connection failed: #{inspect(reason)}"}
    end
  end

  defp check_redis do
    case Redix.command(:redix, ["PING"]) do
      {:ok, "PONG"} ->
        %{name: "Redis", label: "Redis Cache", status: "ok", shortSummary: "available"}

      _ ->
        %{name: "Redis", label: "Redis Cache", status: "failed",
          notificationMessage: "Redis not responding"}
    end
  end

  defp check_disk_space do
    case System.cmd("df", ["-h", "/"]) do
      {output, 0} ->
        # parse_disk_usage/1 is a private helper you implement to
        # extract the use% column from the df output
        used_percent = parse_disk_usage(output)

        cond do
          used_percent >= 90 ->
            %{name: "DiskSpace", label: "Disk Space", status: "failed",
              notificationMessage: "Disk usage critical at #{used_percent}%",
              shortSummary: "#{used_percent}%"}

          used_percent >= 80 ->
            %{name: "DiskSpace", label: "Disk Space", status: "warning",
              notificationMessage: "Disk usage high at #{used_percent}%",
              shortSummary: "#{used_percent}%"}

          true ->
            %{name: "DiskSpace", label: "Disk Space", status: "ok",
              shortSummary: "#{used_percent}%"}
        end

      {_, _exit_code} ->
        # A nonzero exit from df should not crash the whole health check
        %{name: "DiskSpace", label: "Disk Space", status: "crashed",
          notificationMessage: "df command failed"}
    end
  end
end
```
### Node.js/Express

```javascript
app.get('/health', async (req, res) => {
  const checks = await Promise.all([
    checkDatabase(),
    checkRedis(),
    checkDiskSpace()
  ]);

  res.json({
    finishedAt: Math.floor(Date.now() / 1000).toString(),
    checkResults: checks
  });
});

async function checkDatabase() {
  try {
    await db.query('SELECT 1');
    return { name: 'Database', label: 'Database Connection', status: 'ok', shortSummary: 'connected' };
  } catch (error) {
    return { name: 'Database', label: 'Database Connection', status: 'failed',
      notificationMessage: `Database error: ${error.message}` };
  }
}
```
### PHP/Laravel

```php
Route::get('/health', function () {
    $checks = [
        checkDatabase(),
        checkRedis(),
        checkDiskSpace(),
    ];

    return response()->json([
        'finishedAt' => (string) time(),
        'checkResults' => $checks,
    ]);
});

function checkDatabase(): array {
    try {
        DB::select('SELECT 1');
        return ['name' => 'Database', 'label' => 'Database Connection',
            'status' => 'ok', 'shortSummary' => 'connected'];
    } catch (Exception $e) {
        return ['name' => 'Database', 'label' => 'Database Connection',
            'status' => 'failed', 'notificationMessage' => $e->getMessage()];
    }
}
```
## Libraries

Several open-source libraries can generate compatible health check responses:

- PHP: `spatie/laravel-health`
- Ruby: `health_check`
- Python: `py-healthcheck`
For other languages, implement an endpoint that returns the JSON format described above.
## Best Practices

- **Keep checks fast:** Health check endpoints should respond quickly. Avoid expensive operations.
- **Use meaningful names:** The `name` field should be unique and descriptive for easy identification.
- **Include actionable messages:** When a check fails, the `notificationMessage` should help diagnose the issue.
- **Set appropriate thresholds:** Use `warning` status for early warnings before things become critical.
- **Secure the endpoint:** Consider adding authentication or IP restrictions if the health check reveals sensitive information.
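One way to secure the endpoint is a shared bearer token. This is a hypothetical Express middleware sketch: the `HEALTH_CHECK_TOKEN` environment variable and route wiring are illustrative, and it assumes your monitoring service can be configured to send a matching `Authorization` header with each request.

```javascript
// Rejects requests that do not carry the expected bearer token.
// If no token is configured, the endpoint stays open.
function requireHealthToken(req, res, next) {
  const expected = process.env.HEALTH_CHECK_TOKEN;
  if (!expected) return next();
  const header = req.headers['authorization'] || '';
  if (header === `Bearer ${expected}`) return next();
  res.status(401).json({ error: 'unauthorized' });
}

// Usage: app.get('/health', requireHealthToken, healthHandler);
```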